An exploration of the capabilities and applications of diffusion models
In this part, I gained access to DeepFloyd IF and downloaded the precomputed text embeddings for the provided prompts. For the three text prompts, I display the caption and the model output with num_inference_steps set to 10, 20, and 40. Here are the results:
From the results, we can see that with 10 steps the images are still noticeably noisy, while there is no obvious difference in quality between 20 and 40 steps. The random seed used throughout is 180.
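For reference, here is a minimal sketch of how this sampling can be done with the diffusers library; prompt_embeds and neg_embeds are hypothetical names standing in for the downloaded embeddings.

```python
import torch
from diffusers import DiffusionPipeline

# Stage 1 of DeepFloyd IF, loaded in half precision.
pipe = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
)
pipe.to("cuda")

torch.manual_seed(180)  # the seed used throughout this section
for steps in (10, 20, 40):
    # prompt_embeds / neg_embeds: the precomputed text embeddings (hypothetical names)
    image = pipe(
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=neg_embeds,
        num_inference_steps=steps,
    ).images[0]
    image.save(f"sample_{steps}_steps.png")
```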
First, I implemented the forward process, the forward(im, t) function, which adds random noise to the image with strength determined by the timestep t. The test image at noise levels 250, 500, and 750 is shown below:
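A minimal sketch of this function, assuming alphas_cumprod is the scheduler's cumulative-product noise schedule (as exposed by diffusers schedulers):

```python
import torch

def forward(im: torch.Tensor, t: int, alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """DDPM forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    alpha_bar = alphas_cumprod[t]
    eps = torch.randn_like(im)  # eps ~ N(0, I)
    return alpha_bar.sqrt() * im + (1 - alpha_bar).sqrt() * eps
```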
Then, I tried using Gaussian blur filtering to remove the noise. The result is shown below:
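This classical baseline is a one-liner with torchvision, where noisy_im is the output of forward() above; the kernel size and sigma here are illustrative, not the exact values I used:

```python
import torchvision.transforms.functional as TF

# Gaussian blur removes high-frequency noise, but destroys image detail too.
blurred = TF.gaussian_blur(noisy_im, kernel_size=7, sigma=2.0)
```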
I then used the diffusion model for one-step denoising of the test image at noise levels 250, 500, and 750: the UNet predicts the noise in the image, which can be removed in a single step to estimate the clean image. The results are shown below:
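A sketch of this step, assuming the stage-1 UNet from the pipeline above; note that DeepFloyd's UNet stacks its noise and variance predictions along the channel dimension, so only the first three channels are the noise estimate:

```python
import torch

def one_step_denoise(x_t, t, unet, prompt_embeds, alphas_cumprod):
    with torch.no_grad():
        # keep the first 3 channels: the noise estimate
        eps = unet(x_t, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
    alpha_bar = alphas_cumprod[t]
    # solve the forward-process equation for x_0
    return (x_t - (1 - alpha_bar).sqrt() * eps) / alpha_bar.sqrt()
```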
Next, I implemented iterative denoising. At each step, the model estimates the noise and hence the clean image, and we move partway toward that estimate, so the denoised image becomes progressively more accurate. The result is shown below:
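A sketch of the loop over a strided list of timesteps (e.g. 990 down to 0); the small added-noise term of the DDPM update is omitted for brevity:

```python
import torch

def iterative_denoise(x_t, timesteps, unet, prompt_embeds, alphas_cumprod):
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_bar, a_bar_next = alphas_cumprod[t], alphas_cumprod[t_next]
        alpha = a_bar / a_bar_next  # effective alpha for this stride
        beta = 1 - alpha
        with torch.no_grad():
            eps = unet(x_t, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
        # clean-image estimate from the current noise prediction
        x0 = (x_t - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
        # interpolate between the clean estimate and the current noisy image
        x_t = (a_bar_next.sqrt() * beta / (1 - a_bar)) * x0 \
            + (alpha.sqrt() * (1 - a_bar_next) / (1 - a_bar)) * x_t
    return x_t
```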
I then used the model to generate images from scratch, starting the iterative denoiser from pure noise. Here are five images generated:
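With the sketch above, generation from scratch reduces to starting from pure noise at the largest timestep (64x64 being the stage-1 resolution):

```python
import torch

x_T = torch.randn(1, 3, 64, 64, device="cuda", dtype=torch.float16)  # pure noise
sample = iterative_denoise(x_T, timesteps, unet, prompt_embeds, alphas_cumprod)
```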
To improve the generated images, I applied Classifier-Free Guidance (CFG). Instead of using the conditional noise estimate directly at each step, I extrapolated it away from the unconditional one: eps = eps_uncond + gamma * (eps_cond - eps_uncond), with guidance scale gamma > 1. The results are shown below:
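A sketch of the guided noise estimate, with gamma = 7 as an illustrative scale:

```python
import torch

def cfg_noise(x_t, t, unet, cond_embeds, uncond_embeds, guidance_scale=7.0):
    with torch.no_grad():
        eps_cond = unet(x_t, t, encoder_hidden_states=cond_embeds).sample[:, :3]
        eps_uncond = unet(x_t, t, encoder_hidden_states=uncond_embeds).sample[:, :3]
    # extrapolate away from the unconditional estimate
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```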
Interestingly, the results of CFG are mostly human faces, possibly because the training data predominantly features human faces.
I took the test image, added noise to it, and then denoised it. I experimented with different amounts of noise, setting i_start to 1, 3, 5, 7, 10, and 20; a smaller i_start means more noise and hence a larger departure from the original. The results are shown below:
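A sketch of this noise-then-denoise procedure, reusing the forward() and iterative_denoise() sketches above:

```python
def sdedit(im, i_start, timesteps, unet, prompt_embeds, alphas_cumprod):
    # Noise the input to timesteps[i_start], then denoise from there.
    t = timesteps[i_start]
    x_t = forward(im, t, alphas_cumprod)
    return iterative_denoise(x_t, timesteps[i_start:], unet,
                             prompt_embeds, alphas_cumprod)
```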
I also applied this method to my own images. The results are shown below:
Besides the test image and my own images, I applied the method to a web image and two hand-drawn images. The results are shown below:
I used the same procedure to implement inpainting, following the RePaint paper. Given an image and a binary mask m, we can create a new image that retains the original content where m is 0 and generates new content where m is 1: after every denoising step, the pixels where m is 0 are forced back to the original image, noised to the current timestep. The results are shown below, including the test image and two of my own images:
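A sketch of this loop, built on the same update as iterative_denoise() above:

```python
import torch

def inpaint(im, mask, timesteps, unet, prompt_embeds, alphas_cumprod):
    x_t = forward(im, timesteps[0], alphas_cumprod)  # near-pure noise at the first step
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_bar, a_bar_next = alphas_cumprod[t], alphas_cumprod[t_next]
        alpha = a_bar / a_bar_next
        beta = 1 - alpha
        with torch.no_grad():
            eps = unet(x_t, t, encoder_hidden_states=prompt_embeds).sample[:, :3]
        x0 = (x_t - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
        x_t = (a_bar_next.sqrt() * beta / (1 - a_bar)) * x0 \
            + (alpha.sqrt() * (1 - a_bar_next) / (1 - a_bar)) * x_t
        # force the region where mask == 0 back to the (noised) original
        x_t = mask * x_t + (1 - mask) * forward(im, t_next, alphas_cumprod)
    return x_t
```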
I then applied SDEdit with a text prompt guiding the denoising. The test image was denoised with the prompt "a photo of a rocket". The result is shown below:
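In code, this is just the sdedit() sketch above called with the rocket prompt's embedding; rocket_embeds is a hypothetical name and i_start = 10 is illustrative:

```python
result = sdedit(test_im, i_start=10, timesteps=timesteps, unet=unet,
                prompt_embeds=rocket_embeds, alphas_cumprod=alphas_cumprod)
```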
Applied to my own images:
I created optical illusions (visual anagrams) using diffusion models: at each step, the noise is estimated once on the upright image with one prompt and once on the flipped image with the other prompt, and the two estimates (the second flipped back) are averaged. In this part, I generated an image that looks like "an oil painting of people around a campfire", but when flipped upside down reveals "an oil painting of an old man". The result is shown below:
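A sketch of the combined noise estimate, which can be plugged into the iterative denoiser in place of the plain UNet prediction:

```python
import torch

def anagram_noise(x_t, t, unet, embeds1, embeds2):
    with torch.no_grad():
        # prompt 1 on the upright image
        eps1 = unet(x_t, t, encoder_hidden_states=embeds1).sample[:, :3]
        # prompt 2 on the vertically flipped image
        flipped = torch.flip(x_t, dims=[2])
        eps2 = unet(flipped, t, encoder_hidden_states=embeds2).sample[:, :3]
    # flip the second estimate back and average
    return (eps1 + torch.flip(eps2, dims=[2])) / 2
```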
Other results are shown below:
In this part, I implemented Factorized Diffusion and created hybrid images, similar to those in Project 2: the low frequencies of the noise estimate come from one prompt and the high frequencies from another, so the image reads differently up close and from a distance. The results are shown below:
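A sketch of the factorized noise estimate; the Gaussian kernel size and sigma are illustrative choices:

```python
import torch
import torchvision.transforms.functional as TF

def hybrid_noise(x_t, t, unet, embeds_far, embeds_near,
                 kernel_size=33, sigma=2.0):
    with torch.no_grad():
        eps_far = unet(x_t, t, encoder_hidden_states=embeds_far).sample[:, :3]
        eps_near = unet(x_t, t, encoder_hidden_states=embeds_near).sample[:, :3]
    # low frequencies from the "far" prompt, high frequencies from the "near" one
    lowpass = TF.gaussian_blur(eps_far, kernel_size, sigma)
    highpass = eps_near - TF.gaussian_blur(eps_near, kernel_size, sigma)
    return lowpass + highpass
```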